-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
On read, change order of dealing with redist vs undist #169
Conversation
The code has not properly handled the read of a Field or Array with both ungridded / undistributed dimensions and > 1 DE per PET. It appears that this is due to an incorrect ordering of dealing with these two situations in IO::read. This commit changes that ordering to resolve this issue. Note that I also needed to reassign temp_array_p to temp_array_undist_p before doing the redistStore and redist; without this, the redistStore call generated an error with this trace-back: 20230724 170325.547 ERROR PET3 ESMCI_Array.C:6238 ESMCI::Array::tRedistStore() Invalid argument - srcArray and dstArray must provide identical number of exclusive elements 20230724 170325.547 ERROR PET3 ESMCI_Array.C:6018 ESMCI::Array::redistStore() Invalid argument - Internal subroutine call returned Error 20230724 170325.547 ERROR PET3 ESMCI_IO.C:387 ESMCI::IO::read() Invalid argument - Internal subroutine call returned Error 20230724 170325.547 ERROR PET3 ESMCI_IO.C:282 ESMCI::IO::read() Invalid argument - Internal subroutine call returned Error 20230724 170325.548 ERROR PET3 ESMCI_IO_F.C:210 c_esmc_ioread() Unable to read from file - Internal subroutine call returned Error 20230724 170325.548 ERROR PET3 ESMF_IO.F90:397 ESMF_IOAddArray() Unable to read from file - Internal subroutine call returned Error 20230724 170325.548 ERROR PET3 ESMF_FieldPr.F90:336 ESMF_FieldRead Unable to read from file - Internal subroutine call returned Error 20230724 170325.548 ERROR PET3 esmApp.F90:185 Unable to read from file - Passing error in return code Resolves #150
@billsacks - Just a quick question with regards of this PR and 8.5.0. I see that this is work on |
Right - I do not intend for this to go into 8.5.0. Thanks for confirming. |
This tests a case that was failing until the work on this branch: each of these situations was tested individually, but the combination wasn't tested and wasn't working.
I just pushed a commit that adds unit tests for this previously-failing case. This feels like a lot of test code for testing a single obscure scenario, but it kind of feels like that's just the way it has to be... though I'm open to feedback. (I could have shortened it a bit by looping over the two DEs per PET, but I kept it this way for consistency with some other code in this test module.) |
@billsacks - commenting on your point (1) from above. Your outline of steps taken does appear correct to me. Your hope to Redist() directly from |
As for the new test code, the main thing is that the multi-DE + undistr dim issue is being tested. I did not do a thorough review, but from what I looked at it seem good to me. Test code in my experience often does get bulky, but it's not necessarily a bad thing. |
Thanks a lot for looking at this @theurich ! I like your suggestion; I'll make that change. |
The code has not properly handled the read of a Field or Array with both ungridded / undistributed dimensions and > 1 DE per PET. It appears that this is due to an incorrect ordering of dealing with these two situations in IO::read. This commit changes that ordering to resolve this issue.
Note that I also needed to reassign temp_array_p to temp_array_undist_p before doing the redistStore and redist; without this, the redistStore call generated an error with this trace-back:
Resolves #150
I have verified that a round-trip write-then-read of a single-tile Field with both ungridded dimensions and 0 or > 1 DE per PET on some PETs gives identical results to the original Field.
@theurich and/or @oehmke - I would like your input on two things here (I am assigning you both as reviewers, but input from one of you is probably sufficient):
(1) I would like a second set of eyes / brain on the logic here to confirm that this flow of logic looks reasonable. Specifically, I want confirmation that the logic around the redistStore call looks reasonable. At a high level, for an Array with both undistributed dimensions and > 1 DE per PET, we (a) create a temp Array with 1 DE per PET; (b) create an aliased version of this Array that treats all dimensions as distributed, saving the result of (a) as
temp_array_undist_p
; (c) Read into the Array created in (b); (d) do a redistStore and then redist fromtemp_array_undist_p
into the original (user) Array. I think we can rely ontemp_array_undist_p
being filled properly with data by the Read intotemp_array_p
because of the reference sharing betweentemp_array_undist_p
andtemp_array_p
(and my testing gives me more confidence in this), but I want to check to make sure you don't see any edge cases where this logic will fail. (I had originally hoped that I could do the redistStore and redist directly fromtemp_array_p
back to the original (user) Array, but I wasn't too surprised to find that this failed, with the message given in the above traceback.)(2) I did this testing with a standalone program (https://github.com/billsacks/esmfprojects-multi_tile_io/blob/main/esmApp.F90), not a unit test. I plan to add unit testing of this complex case (Array / Field with both ungridded dimensions and 0 or > 1 DE per PET on some PETs) in my multi-tile IO unit tests. My question is: do you feel I should also add unit tests of this combination on single-tile Arrays / Fields – probably in ESMF_ArrayIOUTest and ESMF_FieldIOUTest? It seems like we'll get the necessary code / logic coverage through the multi-tile tests, but the advantage of testing this combination with single-tile Arrays / Fields is that, if the test fails in the future, it would allow distinguishing between an issue with multi-tile vs. an issue that also impacts single-tile. The downside is that I think setting up these tests will be a bit complex, so adding single-tile versions of these tests will mean more test code to write and maintain. I can go either way on this myself; do you have feelings?